retrieval-augmented language model
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically.
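To give a concrete sense of the artifacts such an environment works with, here is a small Lean 4 theorem with a tactic-style proof; after each tactic, the proof environment reports the remaining goal state, which is what a prover interacts with. The example is purely illustrative and is not drawn from the LeanDojo benchmark:

```lean
-- A toy theorem of the kind a tactic-based prover manipulates.
-- `Nat.add_comm` is the core-library lemma `n + m = m + n`.
theorem add_comm_example (n m : Nat) : n + m = m + n := by
  exact Nat.add_comm n m
```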
Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises
Large Language Models (LLMs) enhanced with retrieval -- commonly referred to as Retrieval-Augmented Generation (RAG) -- have demonstrated strong performance in knowledge-intensive tasks. However, RAG pipelines often fail when retrieved evidence is incomplete, leaving gaps in the reasoning process. In such cases, \emph{abductive inference} -- the process of generating plausible missing premises to explain observations -- offers a principled approach to bridge these gaps. In this paper, we propose a framework that integrates abductive inference into retrieval-augmented LLMs. Our method detects insufficient evidence, generates candidate missing premises, and validates them through consistency and plausibility checks. Experimental results on abductive reasoning and multi-hop QA benchmarks show that our approach improves both answer accuracy and reasoning faithfulness. This work highlights abductive inference as a promising direction for enhancing the robustness and explainability of RAG systems.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
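The three-stage loop the abstract describes (detect insufficient evidence, generate candidate missing premises, validate them) can be sketched as follows. Every function here is a toy lexical stand-in of my own devising, not the authors' implementation; a real system would replace these heuristics with LLM calls:

```python
# Toy sketch of an abductive RAG loop: detect evidence gaps, generate
# candidate missing premises, validate them, and augment the context.
# All heuristics below are illustrative stand-ins for LLM-based components.

def detect_gap(question, evidence):
    """Return question terms not covered by the evidence (lexical stand-in)."""
    terms = {w.strip("?.,!").lower() for w in question.split()}
    terms = {t for t in terms if len(t) > 3}          # drop short/stop words
    e_text = " ".join(evidence).lower()
    return sorted(t for t in terms if t not in e_text)

def generate_premises(gaps):
    """Stand-in for LLM generation of plausible missing premises."""
    return [f"Assumed premise covering '{g}'" for g in gaps]

def validate(premise, evidence):
    """Stand-in consistency check: reject premises that duplicate evidence."""
    return premise not in evidence

def abductive_rag(question, evidence):
    gaps = detect_gap(question, evidence)
    premises = [p for p in generate_premises(gaps) if validate(p, evidence)]
    return evidence + premises   # augmented context for the reader LLM
```

The key design point is that the generated premises are only hypotheses: they enter the context after the validation step, so a downstream reader can still distinguish retrieved evidence from abduced assumptions.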
Structured Relevance Assessment for Robust Retrieval-Augmented Language Models
Raj, Aryan, Garg, Astitva Veer, D, Anitha
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation tasks, revolutionizing how machines interact with human language. Despite their impressive performance, these models continue to struggle with factual accuracy, often producing content that appears plausible but contains incorrect information -- a phenomenon commonly referred to as "hallucination" [1]. Retrieval-Augmented Language Models (RALMs) address this by grounding generation in documents retrieved from external sources. However, despite their conceptual elegance, RALMs face several critical challenges that undermine their effectiveness in real-world scenarios. First, these systems often struggle to distinguish between relevant and irrelevant retrieved documents, treating all retrievals as equally important regardless of their actual utility for answering the query at hand. Second, standard RALMs frequently over-rely on external retrievals even in situations where their intrinsic knowledge would be sufficient or more reliable. This rigid dependence on external sources fails to leverage the substantial knowledge already encoded in model parameters during pre-training and fine-tuning. Perhaps most concerning is RALMs' inability to acknowledge knowledge gaps when confronted with queries that cannot be answered from either retrieved information or intrinsic knowledge. Instead of transparently communicating their limitations -- a crucial capability for trustworthy AI systems -- these models often generate fabricated responses that appear authoritative despite lacking factual foundation.
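The three failure modes above suggest a gating policy: use retrieval only when retrieved documents clear a relevance bar, fall back to parametric knowledge when they do not, and abstain when neither source suffices. A minimal sketch, assuming a toy lexical scorer and an illustrative threshold (the paper's actual assessment method is not specified here):

```python
# Toy sketch of relevance-gated answering: retrieve only when documents
# clear a relevance bar, otherwise fall back to parametric knowledge or
# abstain. The scorer and threshold are illustrative assumptions.

def relevance_score(query, doc):
    """Fraction of (longer) query words that appear in the document."""
    tokens = lambda s: {w.strip(".,?").lower() for w in s.split()
                        if len(w.strip(".,?")) > 3}
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q) if q else 0.0

def answer_policy(query, docs, can_answer_parametrically, threshold=0.3):
    relevant = [d for d in docs if relevance_score(query, d) >= threshold]
    if relevant:
        return "use_retrieval", relevant    # ground the answer in documents
    if can_answer_parametrically:
        return "use_parametric", []         # rely on model parameters
    return "abstain", []                    # acknowledge the knowledge gap
```

The explicit "abstain" branch is what the abstract argues standard RALMs lack: a path that communicates a knowledge gap instead of fabricating an authoritative-sounding answer.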
Studying the Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training Efficiency
Doostmohammadi, Ehsan, Kuhlmann, Marco
Retrieval-augmented language models have demonstrated performance comparable to that of much larger models while requiring fewer computational resources. The effectiveness of these models crucially depends on the overlap between the query and the retrieved context, but the optimal degree of this overlap remains unexplored. In this paper, we systematically investigate how varying levels of query--context overlap affect model performance during both training and inference. Our experiments reveal that increased overlap initially has minimal effect, but above a critical threshold it substantially improves test-time perplexity and accelerates model learning. Building on these findings, we demonstrate that deliberately increasing overlap through synthetic context -- generated by paraphrasing queries -- can enhance data efficiency and reduce training time by approximately 40\% without compromising performance. We validate our perplexity-based findings on question-answering tasks, confirming that the benefits of retrieval-augmented language modeling extend to practical applications. Our results provide empirical evidence of significant optimization potential for retrieval mechanisms in language model pretraining.
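One simple instantiation of the query--context overlap being varied is unigram overlap: the fraction of query tokens that appear in the retrieved context. This is an assumed metric for illustration (the paper's exact measure may differ), and prepending a query paraphrase shows how synthetic context raises it:

```python
# Assumed overlap metric: fraction of query unigrams present in the context.

def token_overlap(query, context):
    q = set(query.lower().split())
    c = set(context.lower().split())
    return len(q & c) / len(q) if q else 0.0

def augment_context(context, paraphrase):
    # `paraphrase` would come from a paraphrasing model in practice;
    # prepending it to the retrieved context deliberately raises overlap.
    return paraphrase + " " + context
```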
Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data
Yang, Shiping, Wu, Jie, Ding, Wenbiao, Wu, Ning, Liang, Shining, Gong, Ming, Zhang, Hengyuan, Zhang, Dongmei
Robustness has become a critical attribute for the deployment of RAG systems in real-world applications. Existing research focuses on robustness to explicit noise (e.g., document semantics) but overlooks spurious features (a.k.a. implicit noise). While previous works have explored spurious features in LLMs, they are limited to specific features (e.g., formats) and narrow scenarios (e.g., ICL). In this work, we statistically confirm the presence of spurious features in the RAG paradigm, a robustness problem caused by the sensitivity of LLMs to semantic-agnostic features. Moreover, we provide a comprehensive taxonomy of spurious features and empirically quantify their impact through controlled experiments. Further analysis reveals that not all spurious features are harmful; some can even be beneficial. Extensive evaluation results across multiple LLMs suggest that spurious features are a widespread and challenging problem in the field of RAG. To facilitate future research, we release all code and data at https://github.com/maybenotime/RAG-SpuriousFeatures.
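The controlled-experiment idea can be sketched as a perturbation protocol: apply semantic-agnostic transformations (e.g., format changes) to the grounding document and measure how often the system's answer survives. The perturbation set and the `qa_system` interface below are illustrative assumptions, not the paper's taxonomy or code:

```python
# Sketch of a controlled perturbation protocol for spurious
# (semantic-agnostic) features: transform only the *format* of the
# grounding document and check whether the answer survives.

PERTURBATIONS = {
    "uppercase": str.upper,                                  # casing
    "bulleted": lambda d: "- " + d.replace(". ", ".\n- "),   # layout
    "padded": lambda d: d + "\n\n\n",                        # whitespace
}

def robustness_rate(qa_system, question, document):
    """Fraction of semantic-preserving perturbations that leave the answer
    unchanged (1.0 = fully robust to these spurious features)."""
    baseline = qa_system(question, document)
    unchanged = sum(
        qa_system(question, perturb(document)) == baseline
        for perturb in PERTURBATIONS.values()
    )
    return unchanged / len(PERTURBATIONS)
```

Because each perturbation preserves document semantics by construction, any answer flip it causes can be attributed to a spurious feature rather than to the evidence itself.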
Unstructured and structured data: Can we have the best of both worlds with large language models?
We are witnessing rapid advancements in the area of large language models (LLMs). A search on Google Scholar shows about 3,910 papers with "large language models" in their titles in 2022; as of April 12, 2023, there are already 1,700 such articles. In addition to Google Scholar, huge volumes of blog posts, news articles, Twitter feeds, and open-source repositories around LLMs have sprung up in recent months. Perhaps ChatGPT (released on November 30, 2022) is the epitome of this LLM revolution, which truly unleashed and showcased, to the masses, the power of what has been brewing in the natural language and machine learning communities in recent years.